
fix(heap_table): TupleId sentinel bug and 28 unit tests#29

Merged
poyrazK merged 3 commits into main from test/raft-group-coverage
Apr 13, 2026

Conversation

poyrazK (Owner) commented Apr 13, 2026

Summary

  • Fix TupleId is_null() treating valid (0,0) as null (sentinel was 0, fixed to UINT32_MAX)
  • Fix get_meta() not checking xmax, returning logically deleted tuples
  • Fix fetch_page_by_id() returning zeroed page beyond EOF (infinite loop bug)
  • Add 28 unit tests for HeapTable covering insert, update, delete, scan, MVCC
  • Ignore _deps/ build artifact directory

Test plan

  • All 27 existing tests pass
  • All 28 new heap_table_tests pass

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling for failed page reads in the storage layer; failed acquisitions are now properly rolled back instead of leaving inconsistent state.
  • Tests

    • Added comprehensive test suite for heap table storage functionality, including MVCC semantics, tuple insertion, updates, deletion, and scanning operations.


coderabbitai Bot commented Apr 13, 2026

Warning

Rate limit exceeded

@github-actions[bot] has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 48 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 2 minutes and 48 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 75301ae2-0e07-4b51-9e18-d77082228414

📥 Commits

Reviewing files that changed from the base of the PR and between 2fd70a4 and 615fa1c.

📒 Files selected for processing (5)
  • .gitignore
  • CMakeLists.txt
  • include/storage/heap_table.hpp
  • src/storage/heap_table.cpp
  • tests/heap_table_tests.cpp
📝 Walkthrough

This PR extends heap table storage with MVCC support and improves error handling in the buffer pool manager. Changes include updating the TupleId null sentinel from (0, 0) to (UINT32_MAX, 0), adding logical deletion support by checking xmax values, refining buffer pool read failure handling to roll back frame acquisition, and introducing a comprehensive test suite validating heap table CRUD operations, MVCC semantics, and transaction-ID handling.

Changes

  • Build Configuration (.gitignore, CMakeLists.txt): Added _deps/ to the ignore rules and registered the new heap_table_tests executable in the test suite.
  • TupleId Semantics (include/storage/heap_table.hpp): Changed the null/invalid sentinel from (page_num=0, slot_num=0) to (page_num=UINT32_MAX, slot_num=0) in the default constructor and is_null() method.
  • Buffer Pool Error Handling (src/storage/buffer_pool_manager.cpp): On storage_manager_.read_page() failure, the code now rolls back the frame acquisition by unpinning the frame, clearing the page-table mapping, and resetting the frame metadata instead of zero-initializing the page data.
  • Logical Deletion Support (src/storage/heap_table.cpp): HeapTable::get_meta now treats tuples with a non-zero xmax as logically deleted, returning false and unpinning the page without deserializing values.
  • Comprehensive Test Suite (tests/heap_table_tests.cpp): New GoogleTest file covering heap table construction, table lifecycle, tuple insertion/retrieval, MVCC update semantics, logical deletion via remove(), physical deletion via physical_remove(), scan behavior, iterator semantics, and type initialization/formatting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Stability & Testing Refinement #11: Modifies buffer pool manager's page-read/fetch logic in buffer_pool_manager.cpp, with potentially conflicting changes to read failure handling.

Poem

🐰 Hop along with deleted tuples in MVCC's dance,
Where UINT32_MAX marks the null at a glance,
Buffer frames now rollback when reads go astray,
Tests confirm each logical delete's gray way.
Heap tables leap forward, with xmax to sway!

🚦 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main changes (fixing a TupleId sentinel bug and adding 28 unit tests for HeapTable), which aligns with the primary objectives of the pull request.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/storage/heap_table.hpp`:
- Around line 40-44: The header changed the null sentinel for TupleId to use
page_num == UINT32_MAX, so replace all hard-coded invalid RIDs written as {0, 0}
with TupleId{} (default ctor) to preserve the null semantics; specifically
update the initialization of Iterator::last_id_ (seed value) and the code path
that returns an invalid RID on insert failure to return TupleId{} instead of
{0,0}, and grep for any other occurrences of literal {0,0} that represent “no
tuple” and fix them similarly.

In `@src/storage/buffer_pool_manager.cpp`:
- Around line 108-109: The frame is being made evictable via
replacer_.unpin(frame_id) and then immediately pushed into free_list_ causing
duplication; modify the logic in the function containing
replacer_.unpin(frame_id); free_list_.push_back(frame_id) so a frame is only
returned from one source: either remove the replacer_.unpin call when you intend
to move the frame to free_list_ (so only free_list_ holds it), or instead
check/avoid pushing frames already managed by the replacer (use the same
ownership path as unpin_page_by_id()). Ensure you update the code paths that
manipulate frame_id to use replacer_.unpin exclusively or free_list_.push_back
exclusively to prevent the frame being allocatable from two places.

In `@src/storage/heap_table.cpp`:
- Around line 546-550: The code in get_meta() is making get_meta() hide
logically-deleted tuples by checking out_meta.xmax and returning false, which
breaks callers that expect raw metadata (e.g., executor::operator uses
get_meta() and applies snapshot rules). Revert this: remove the deletion-check
early return in the get_meta() path (and keep the page unpin semantics correct)
so get_meta() continues to return raw meta regardless of xmax; then move the
deleted-tuple filtering into the visibility-aware read path (either
heap_table::get() or a new helper like is_visible_under_snapshot()) and update
callers (e.g., operator.cpp usage) to call that visibility helper or let get()
enforce plain-read hiding if desired.
- Around line 546-549: The fast-path early return unpins the page via
bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false) even when the page is
stored in cached_page_, leaving cached_page_ as a dangling/stale owned pointer;
change the logic in the block that checks out_meta.xmax != 0 to avoid unpinning
a cached_page_ (either skip bpm_.unpin_page_by_id when cached_page_ is set or
first clear/reset cached_page_ before unpinning) so cached_page_ no longer holds
a stale reference — update the code around cached_page_, bpm_.unpin_page_by_id,
file_id_, and tuple_id.page_num accordingly.

In `@tests/heap_table_tests.cpp`:
- Around line 345-366: The test reads HeapTable::TupleMeta fields after calling
table_->get_meta(rid, meta) without asserting the call succeeded; update the
tests (e.g., in TEST_F(HeapTableTests, MVCCUpdateTransactionId)) to explicitly
check the return of table_->get_meta(rid, meta) with
ASSERT_TRUE(table_->get_meta(rid, meta)) (or EXPECT_FALSE where appropriate)
before accessing meta.xmin/meta.xmax so a failing get_meta() causes the test to
fail immediately; locate usages of table_->get_meta and add the assertion
immediately after each call that is followed by direct reads of meta.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77dae9df-6913-4817-a122-79051e3bdd1f

📥 Commits

Reviewing files that changed from the base of the PR and between 805194e and 2fd70a4.

📒 Files selected for processing (6)
  • .gitignore
  • CMakeLists.txt
  • include/storage/heap_table.hpp
  • src/storage/buffer_pool_manager.cpp
  • src/storage/heap_table.cpp
  • tests/heap_table_tests.cpp

Comment thread include/storage/heap_table.hpp
Comment thread src/storage/buffer_pool_manager.cpp Outdated
Comment on lines +108 to +109
replacer_.unpin(frame_id);
free_list_.push_back(frame_id);

⚠️ Potential issue | 🔴 Critical

Don't put the same frame in both free_list_ and the replacer.

unpin_page_by_id() already uses replacer_.unpin() to make a zero-pin frame evictable again at Lines 146-149. Doing that here and then pushing the same frame_id into free_list_ makes the failed frame allocatable from two places, which can hand out the same frame twice after a read miss.

Suggested fix
-        replacer_.unpin(frame_id);
         free_list_.push_back(frame_id);
         page_table_.erase(key);
         page->file_name_.clear();
         page->page_id_ = 0;
         page->pin_count_ = 0;
+        page->is_dirty_ = false;
         return nullptr;

Comment on lines +546 to +550
/* Skip tuples that have been logically deleted */
if (out_meta.xmax != 0) {
bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
return false;
}

⚠️ Potential issue | 🟠 Major

Keep get_meta() as raw metadata access.

src/executor/operator.cpp Lines 200-228 still calls get_meta() and then applies snapshot visibility rules using meta.xmax. Returning false here makes deleted-but-still-visible versions disappear for older snapshots, and because the fallback path at Lines 598-677 does not do the same check, the result currently depends on whether the page happens to be cached. If plain reads should hide deleted tuples, move that policy into get() or a separate visibility-aware helper.


Comment on lines +546 to +549
/* Skip tuples that have been logically deleted */
if (out_meta.xmax != 0) {
bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
return false;

⚠️ Potential issue | 🔴 Critical

Do not unpin cached_page_ from this fast path.

On the cached-page branch, cached_page_ stays stored on the HeapTable after this return. Dropping the pin here without clearing that cache leaves a stale owned pointer behind, so later inserts or destruction can unpin/reuse a frame the table no longer owns.

Suggested fix
         /* Skip tuples that have been logically deleted */
         if (out_meta.xmax != 0) {
-            bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
             return false;
         }

Comment on lines +345 to +366
HeapTable::TupleMeta meta;
table_->get_meta(rid, meta);
EXPECT_EQ(meta.xmin, 100U);
EXPECT_EQ(meta.xmax, 0U);

table_->remove(rid, 200); // xmax = 200
table_->get_meta(rid, meta);
EXPECT_EQ(meta.xmax, 200U);
}

TEST_F(HeapTableTests, MVCCUpdateTransactionId) {
ASSERT_TRUE(table_->create());
auto tuple = make_test_tuple(1, "Original");
auto rid = table_->insert(tuple, 1);

auto new_tuple = make_test_tuple(1, "Updated");
table_->update(rid, new_tuple, 99);

// xmax should be set for old version, or new tuple has xmin=99
HeapTable::TupleMeta meta;
table_->get_meta(rid, meta);
EXPECT_TRUE(meta.xmin == 99 || meta.xmax == 99);

⚠️ Potential issue | 🟠 Major

Assert the get_meta() result before reading meta.

These MVCC checks inspect meta after a second get_meta() call without first asserting whether that call succeeded. If get_meta() regresses and returns false, the test will read stale values from the previous successful call instead of failing at the real source. Please make the intended contract explicit with ASSERT_TRUE(...) or EXPECT_FALSE(...) before the field assertions.


- Fix TupleId is_null() treating valid (0,0) as null (sentinel was 0, fixed to UINT32_MAX)
- Fix get_meta() not checking xmax, returning logically deleted tuples
- Fix fetch_page_by_id() returning zeroed page beyond EOF (infinite loop bug)
- Add 28 unit tests for HeapTable covering insert, update, delete, scan, MVCC
- Ignore _deps/ build artifact directory
poyrazK force-pushed the test/raft-group-coverage branch from 2fd70a4 to f579051 on April 13, 2026 at 16:21
- Fix TupleId is_null() treating valid (0,0) as null (sentinel was 0, fixed to UINT32_MAX)
- Fix get_meta() not checking xmax, returning logically deleted tuples
- Fix default struct initialization for thread sanitizer compatibility
- Add 28 unit tests for HeapTable covering insert, update, delete, scan, MVCC
- Ignore _deps/ build artifact directory
poyrazK force-pushed the test/raft-group-coverage branch from 6cf1ce8 to 0f006ed on April 13, 2026 at 16:31
poyrazK merged commit 078c09b into main on Apr 13, 2026
1 check passed
